Detailed Provenance Capture of Data Processing
Abstract
A large part of Linked Data generation entails processing the raw data. However, this process is only documented in human-readable form or as a software repository. This inhibits reproducibility and comparability, as current documentation solutions do not provide detailed metadata and rely on the availability of specific software environments. This paper proposes an automatic capturing mechanism for interchangeable and implementation-independent metadata and provenance that includes data processing. Using declarative mapping documents to describe the computational experiment allows automatic capturing of term-level provenance for both schema and data transformations, and for both the software tools used and the input-output pairs of the data processing executions. This approach is applied to mapping documents described using RML and FnO, and implemented in the RMLMapper. The captured metadata can be used to more easily share, reproduce, and compare the dataset generation process across software environments.
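As a rough illustration of what capturing the input-output pairs of data processing executions could look like, here is a minimal Python sketch. It is not the RMLMapper implementation and does not use the RML or FnO vocabularies; the names `ExecutionRecord`, `ProvenanceLog`, and `to_upper`, as well as the sample rows, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List


@dataclass
class ExecutionRecord:
    """One processing step: which function ran, on what input, with what output."""
    function: str
    inputs: Dict[str, Any]
    output: Any
    started_at: str


@dataclass
class ProvenanceLog:
    """Hypothetical in-memory store for per-term execution records."""
    records: List[ExecutionRecord] = field(default_factory=list)

    def run(self, func: Callable[..., Any], **inputs: Any) -> Any:
        """Execute func with the given keyword inputs and record the input-output pair."""
        started = datetime.now(timezone.utc).isoformat()
        output = func(**inputs)
        self.records.append(
            ExecutionRecord(func.__name__, dict(inputs), output, started)
        )
        return output


def to_upper(value: str) -> str:
    """Example data transformation applied to a single term."""
    return value.upper()


# Assumed sample data; each transformed name yields one provenance record.
log = ProvenanceLog()
raw_rows = [{"id": "1", "name": "alice"}, {"id": "2", "name": "bob"}]
terms = [log.run(to_upper, value=row["name"]) for row in raw_rows]
```

Because every call goes through `log.run`, each generated term can be traced back to the function and raw value that produced it, independently of how the transformation itself is implemented.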
Similar resources
Semantic Provenance for Science Data Products: Application to Image Data Processing
A challenge in providing scientific data services to a broad user base is to also provide the metadata services and tools the user base needs to correctly interpret and trust the provided data. Provenance metadata is especially vital to establishing trust, giving the user information on the conditions under which the data originated and any processing that was applied to generate the data produ...
Start Smart and Finish Wise: The Kiel Marine Science Provenance-Aware Data Management Approach
While creating or processing scientific data, it is very important to capture and to archive the corresponding provenance data. “Start smart and finish wise” is our approach for a provenance aware tooling, which helps data managers and scientists not only to manage their data, but also to capture their scientific data in the field, to record the provenance data, to store it for further analysis...
The First International Workshop on the role of Semantic Web in Provenance Management
A challenge in providing scientific data services to a broad user base is to also provide the metadata services and tools the user base needs to correctly interpret and trust the provided data. Provenance metadata is especially vital to establishing trust, giving the user information on the conditions under which the data originated and any processing that was applied to generate the data produ...
Data Provenance and Financial Systemic Risk
We describe the needs for data provenance in a large-scale analytic environment to support financial systemic risk analysis. Government financial regulators need to make sense of the outputs of thousands to tens of thousands of simulation runs invoked by a large analytic staff; automatic capture of data provenance (dataset sources and processing steps) supports analysts without adding to their ...
Data Quality Challenges in Empirical Software Engineering: An Evidence-Based Solution
Empirical software engineering data sets are characterized by data quality problems such as noise, outliers, missing data and redundancy. In this paper I propose to address these and other data quality challenges by developing and employing a provenance software tool that is able to explain and replay data capture and processing activities, and to inform the development of appropriate preventiv...
Publication date: 2017